Automatic Generation of Natural Language Summaries

Author

  • Dimitrios Galanis
Abstract

Automatic text summarization has attracted much interest in recent years, since it could, at least in principle, make information seeking in large document collections less tedious and time-consuming. Most existing summarization methods generate summaries by first extracting the sentences most relevant to the user’s query from documents returned by an information retrieval engine. In this thesis, we present a new, competitive sentence extraction method that assigns relevance scores to the sentences of the texts to be summarized. Coupled with a simple method to avoid selecting redundant sentences, the resulting summarization system achieves state-of-the-art results on widely used benchmark datasets. Moreover, we propose two novel sentence compression methods, which rewrite a source sentence in a shorter form while retaining the most important information. The first method produces extractive compressions, i.e., it only deletes words, whereas the second produces abstractive compressions, i.e., it also uses paraphrasing. Experiments show that the extractive method generates compressions that are better than or comparable to those produced by state-of-the-art systems in terms of grammaticality and meaning preservation. The abstractive method, in turn, produces more varied (due to paraphrasing) and slightly shorter compressions than the extractive one. In terms of grammaticality and meaning preservation, the two methods have similar scores. Finally, we propose an optimization model that generates summaries by jointly se-
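
The pipeline the abstract outlines (score sentences for relevance, then skip redundant ones) can be illustrated with a minimal sketch. This is not the thesis's method: the relevance score used here (fraction of query words found in the sentence), the Jaccard-overlap redundancy check, and all names and thresholds (`summarize`, `redundancy_threshold`, and so on) are assumptions chosen only for illustration.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def summarize(sentences, query, max_sentences=2, redundancy_threshold=0.5):
    """Greedy query-focused extraction with a simple redundancy filter.

    Assumed relevance score: share of query words that occur in the sentence.
    Assumed redundancy test: Jaccard similarity with any already selected
    sentence at or above `redundancy_threshold`.
    """
    query_tokens = tokenize(query)
    scored = []
    for index, sentence in enumerate(sentences):
        tokens = tokenize(sentence)
        if query_tokens:
            score = len(tokens & query_tokens) / len(query_tokens)
        else:
            score = 0.0
        scored.append((score, index, sentence, tokens))

    # Highest score first; ties keep the original document order.
    scored.sort(key=lambda item: (-item[0], item[1]))

    selected = []
    for score, index, sentence, tokens in scored:
        if len(selected) >= max_sentences:
            break
        # Keep the sentence only if it is not too similar to any chosen one.
        if all(jaccard(tokens, chosen) < redundancy_threshold
               for _, chosen in selected):
            selected.append((sentence, tokens))
    return [sentence for sentence, _ in selected]


if __name__ == "__main__":
    docs = [
        "Text summarization reduces reading time.",
        "Summarization of text reduces reading time.",
        "Sentence compression deletes words.",
        "The weather is nice.",
    ]
    for line in summarize(docs, "text summarization compression"):
        print(line)
```

In this toy run the second sentence is skipped because it nearly duplicates the first, so the next most relevant non-redundant sentence is chosen instead. A real system would replace both the scoring function and the similarity measure with learned or better-motivated components.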


Similar resources

Applying Natural Language Generation to Indicative Summarization

Creating indicative summaries that help a searcher decide whether to read a particular document is a difficult task. This paper examines the indicative summarization task from a generation perspective, by first analyzing its required content via published guidelines and corpus analysis. We show how these summaries can be factored into a set of document features, and how an implement...

An Efficient Text Summarizer using Lexical Chains

We present a system which uses lexical chains as an intermediate representation for automatic text summarization. This system builds on previous research by implementing a lexical chain extraction algorithm in linear time. The system is reasonably domain independent and takes as input any text or HTML document. The system outputs a short summary based on the most salient concepts from the origi...

Génération de résumés par abstraction complète

This Ph.D. thesis is the result of several years of research on automatic text summarization. Three major contributions are presented in the form of published and yet to be published papers. They follow a path that moves away from extractive summarization and toward abstractive summarization. The first article describes the HexTac experiment, which was conducted to evaluate the performance of h...

A Hybrid Approach to Multi-document Summarization of Opinions in Reviews

We present a hybrid method to generate summaries of product and services reviews by combining natural language generation and salient sentence selection techniques. Our system, STARLET-H, receives as input textual reviews with associated rated topics, and produces as output a natural language document summarizing the opinions expressed in the reviews. STARLET-H operates as a hybrid abstractive/...

From Extractive to Abstractive Meeting Summaries: Can It Be Done by Sentence Compression?

Most previous studies on meeting summarization have focused on extractive summarization. In this paper, we investigate if we can apply sentence compression to extractive summaries to generate abstractive summaries. We use different compression algorithms, including integer linear programming with an additional step of filler phrase detection, a noisy-channel approach using Markovization formulat...

Generating Natural Language Summaries for Multimedia

In this paper we introduce an automatic system that generates textual summaries of Internet-style video clips by first identifying suitable high-level descriptive features that have been detected in the video (e.g. visual concepts, recognized speech, actions, objects, persons, etc.). Then a natural language generator is constructed using SimpleNLG to compile the high-level features into a textu...

Publication date: 2012